Illustrations Segmentation in Digitized Documents Using Local Correlation Features

Authors

  • Dalia Coppi
  • Costantino Grana
  • Rita Cucchiara

Abstract

In this paper we propose an approach to Document Layout Analysis based on local correlation features. We identify and extract illustrations in digitized documents by learning the discriminative patterns of textual and pictorial regions. The approach proves effective on historical datasets and outperforms the state of the art on challenging documents that contain a wide variety of pictorial elements.

© 2014 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Scientific Committee of IRCDL 2014.
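The abstract does not say how the local correlation features are computed. As a rough illustration only, and assuming they are derived from the normalized autocorrelation of small grayscale patches at a handful of displacements (our assumption, not necessarily the authors' formulation), a pure-Python sketch might look like this. The names `local_correlation`, `correlation_features`, `Patch` and the shift set are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a grayscale patch as rows of intensities.
Patch = List[List[float]]


def local_correlation(patch: Patch, dx: int, dy: int) -> float:
    """Normalized autocorrelation of `patch` with itself shifted by (dx, dy).

    Only the region where the patch and its shifted copy overlap is used.
    Values near 1 mean the patch repeats at that displacement; a flat
    (zero-variance) patch returns 0.0.
    """
    h, w = len(patch), len(patch[0])
    pixels = [v for row in patch for v in row]
    mean = sum(pixels) / len(pixels)
    var = sum((v - mean) ** 2 for v in pixels) / len(pixels)
    if var == 0.0:
        return 0.0
    total, count = 0.0, 0
    # Visit every (y, x) whose shifted partner (y + dy, x + dx) lies inside the patch.
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            total += (patch[y][x] - mean) * (patch[y + dy][x + dx] - mean)
            count += 1
    return total / (count * var) if count else 0.0


def correlation_features(patch: Patch, shifts: Sequence[Tuple[int, int]]) -> List[float]:
    """Hypothetical feature vector: one correlation value per displacement."""
    return [local_correlation(patch, dx, dy) for dx, dy in shifts]


if __name__ == "__main__":
    # Synthetic "text-like" patch: horizontal dark/light bands with a 4-pixel pitch.
    stripes = [[1.0 if y % 4 < 2 else 0.0 for _ in range(16)] for y in range(16)]
    print(correlation_features(stripes, [(0, 1), (0, 2), (0, 4), (4, 0)]))
```

For the striped patch, the vertical shift equal to the band pitch, (0, 4), gives 1.0, half the pitch, (0, 2), gives -1.0, and the horizontal shift (4, 0) gives 1.0. Regular text lines tend to produce this kind of periodic signature, while pictorial regions usually do not, which is the sort of pattern a trained classifier could exploit. The classifier itself is not shown here.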


Publication date: 2014